OCPBUGS-84943: fix(test): drop blocking namespace-deletion wait between both-watch-modes scenarios #712
Conversation
|
@tmshort: This pull request references Jira Issue OCPBUGS-84943, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Walkthrough

Removed an unused import and simplified test teardown in olmv1-singleownnamespace.go: per-iteration explicit polling and deletion of install/watch namespaces, along with some cluster-scoped deletes, were removed; the deferred cleanup handlers (via EnsureCleanupClusterExtension, called with packageName/crdName at the start of each iteration) are relied on for teardown instead.

Changes: test cleanup and teardown ordering
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 11 passed | ❌ 1 failed
❌ Failed checks (1 inconclusive)
✅ Passed checks (11 passed)
|
/jira refresh |
|
@tmshort: This pull request references Jira Issue OCPBUGS-84943, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@tmshort: This pull request references Jira Issue OCPBUGS-84943, which is valid. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/payload-aggregate periodic-ci-openshift-release-main-ci-5.0-e2e-gcp-ovn-techpreview 10 |
|
@tmshort: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8fa60410-47e9-11f1-97b8-dbfe0786b9b6-0 |
|
I do question the rationale of considering a TPNU-only scenario a release blocker, but that's not this PR's fault. |
 Expect(k8sClient.Delete(ctx, crb, client.PropagationPolicy(deletePolicy))).To(Succeed(), "failed to delete ClusterRoleBinding %q", crbName)
 Expect(k8sClient.Delete(ctx, sa, client.PropagationPolicy(deletePolicy))).To(Succeed(), "failed to delete ServiceAccount %q", saName)

 // Trigger namespace deletion and proceed without blocking. By this point
We actually could just rely on DeferCleanup at the end of the scenario, since there isn't resource collision.
And for line 419, isn't the last parameter intended to be the CRD name, not the namespace?
Also, the comment at 447 is no longer correct (which I guess was before the resource parallelization, which might be the source of some CPU complexity but which obviates the need for serial namespace cleanup).
Claude said that "DeferCleanup" was less urgent than the rest of your comment.
I... see. Blink twice if claude won't let you type any more.
But... this seems to be causing failures now, so I don't think I can rely on just DeferCleanup.
…etween both-watch-modes scenarios

The both-watch-modes test loops over two scenarios (singlens, ownns) inside
a single It block and was blocking on full namespace deletion between them.
This caused flaky 300s timeouts on GCP techpreview clusters where master
nodes run at 94-99% CPU, which starves the namespace controller and makes
namespace termination arbitrarily slow.

The wait was not guarding anything real:

- EnsureCleanupClusterExtension already ensures the CE and CRD are gone;
  since CE deletion uses ForegroundPropagation, the ClusterObjectSet teardown
  must complete before the CE disappears, meaning all managed resources
  (Deployments, Services, etc.) are already deleted at that point.
- The singleown bundle installs no ValidatingWebhookConfiguration or
  MutatingWebhookConfiguration, so there is no webhook admission risk.
- Each scenario generates unique namespace names and CRD group suffixes via
  rand.String(4), so a terminating namespace from scenario 1 cannot collide
  with or interfere with scenario 2's resources.

Trigger both namespace deletions and proceed without waiting. The DeferCleanup
registrations that already exist will handle any residual cleanup after the
spec exits.

Fixes: OCPBUGS-84943

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
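The commit's safety argument leans on two properties: Ginkgo's DeferCleanup handlers run LIFO at spec exit, and per-iteration unique names mean a still-terminating namespace cannot collide with the next scenario. A minimal stdlib-only sketch of that pattern (the `spec` type here is an illustrative stand-in, not Ginkgo's real implementation):

```go
package main

import "fmt"

// spec models Ginkgo's DeferCleanup behavior for illustration only:
// registered handlers run in LIFO order when the spec exits.
type spec struct{ cleanups []func() }

func (s *spec) DeferCleanup(f func()) { s.cleanups = append(s.cleanups, f) }

func (s *spec) exit() {
	for i := len(s.cleanups) - 1; i >= 0; i-- {
		s.cleanups[i]()
	}
}

func main() {
	s := &spec{}
	// Two scenarios inside one It block, each with a unique namespace name.
	for i, mode := range []string{"singlens", "ownns"} {
		ns := fmt.Sprintf("install-ns-%s-%d", mode, i) // unique per iteration
		s.DeferCleanup(func() { fmt.Println("cleanup:", ns) })
		fmt.Println("run scenario", mode, "in", ns)
		// Trigger namespace deletion here and proceed without waiting;
		// the unique name means a terminating namespace from this
		// iteration cannot collide with the next iteration's resources.
	}
	s.exit() // deferred handlers fire LIFO after both scenarios finish
}
```

Because cleanup is deferred to spec exit rather than serialized between iterations, slow namespace termination no longer sits on the test's critical path.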
🧹 Nitpick comments (1)
openshift/tests-extension/test/olmv1-singleownnamespace.go (1)
136-143: ⚡ Quick win

Consider fixing similar EnsureCleanupClusterExtension calls in other DeferCleanup handlers.

Line 142 passes (ceName, namespace) but the function expects (packageName, crdName). The same pattern exists at lines 260, 578, and 713. Per the function signature in cluster_extension.go:186, these calls will be ineffective:

- The packageName filter (ce.Spec.Source.Catalog.PackageName == packageName) won't match ceName.
- The namespace string won't match any CRD name.

This is outside the PR's scope but worth addressing in a follow-up for cleanup consistency.

Example fix for line 142:

 DeferCleanup(func() {
   By(fmt.Sprintf("cleanup: deleting ClusterExtension %s", ce.Name))
   _ = k8sClient.Delete(context.Background(), ce, client.PropagationPolicy(metav1.DeletePropagationForeground))
   By("ensuring ClusterExtension is deleted")
-  helpers.EnsureCleanupClusterExtension(context.Background(), ceName, namespace)
+  crdName := fmt.Sprintf("webhooktests-%s.webhook.operators.coreos.io", crdSuffix)
+  helpers.EnsureCleanupClusterExtension(context.Background(), packageName, crdName)
 })

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openshift/tests-extension/test/olmv1-singleownnamespace.go` around lines 136 - 143, The EnsureCleanupClusterExtension calls pass (ceName, namespace) but EnsureCleanupClusterExtension expects (packageName, crdName) and filters by ce.Spec.Source.Catalog.PackageName; update each DeferCleanup caller (including the instance around ClusterExtension ce and the similar calls at the other locations) to pass the extension package name (ce.Spec.Source.Catalog.PackageName) as the first argument and the actual CRD name (the CRD created for the extension, not the namespace) as the second argument so the function’s packageName and crdName filters match; locate calls referencing EnsureCleanupClusterExtension and replace the incorrect parameters accordingly.
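The argument-order bug the nitpick describes can be seen with a tiny model of the helper's filter: if the helper matches on package name, passing the ClusterExtension's own name silently matches nothing. The `ext` shape and names below are assumptions for illustration, not the real helper from cluster_extension.go:

```go
package main

import "fmt"

// ext is an assumed, simplified stand-in for a ClusterExtension, keeping
// only the field the review says the real helper filters on
// (ce.Spec.Source.Catalog.PackageName) plus the object name.
type ext struct {
	name        string // metadata.name, i.e. the ceName the test passes
	packageName string // spec.source.catalog.packageName
}

// findByPackage mimics the helper's filter: it matches packageName only,
// never the object name.
func findByPackage(exts []ext, packageName string) []ext {
	var out []ext
	for _, e := range exts {
		if e.packageName == packageName {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	exts := []ext{{name: "install-ce-abcd", packageName: "webhook-operator"}}
	// Passing the CE name where a package name is expected matches nothing,
	// so the cleanup call becomes a no-op:
	fmt.Println("matches by ceName:     ", len(findByPackage(exts, "install-ce-abcd")))
	fmt.Println("matches by packageName:", len(findByPackage(exts, "webhook-operator")))
}
```

This is why the suggested fix swaps in the package name and a real CRD name: the filter fields must line up with what the caller passes, or the "ensure deleted" step verifies nothing.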
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a583b040-e67f-487e-a5b1-76004a5d339a
📒 Files selected for processing (1)
openshift/tests-extension/test/olmv1-singleownnamespace.go
|
@tmshort: This pull request references Jira Issue OCPBUGS-84943, which is valid. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/payload-aggregate periodic-ci-openshift-release-main-ci-5.0-e2e-gcp-ovn-techpreview 10 |
|
@tmshort: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5be32b40-481a-11f1-9c07-ca61cec83ad6-0 |
♻️ Duplicate comments (1)
openshift/tests-extension/test/olmv1-singleownnamespace.go (1)
447-451: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix stale teardown rationale in the comment block.

At Line 450, the comment says the next iteration’s cleanup call will delete this iteration’s CE; with per-iteration unique packageName/crdName, that is not true. Cleanup here is handled by this iteration’s deferred cleanup at spec exit.

✏️ Suggested comment update:

-// All resources use unique names per scenario, so there is no collision risk between
-// iterations. DeferCleanup handlers registered above handle SA, CRB, CE, and namespaces
-// at spec exit. The EnsureCleanupClusterExtension call at the start of the next iteration
-// (line ~348) will find and delete this iteration’s CE by package name, then block until
-// COS teardown removes all managed resources (including the CRD) before proceeding.
+// All resources use unique names per scenario, so there is no collision risk between
+// iterations. DeferCleanup handlers registered above handle SA, CRB, CE, and namespaces
+// at spec exit for each scenario. The per-scenario EnsureCleanupClusterExtension call
+// (in this scenario's cleanup) removes lingering CE/CRD for this scenario's package/CRD.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openshift/tests-extension/test/olmv1-singleownnamespace.go` around lines 447 - 451, Update the stale comment explaining teardown: clarify that because each iteration uses unique packageName and crdName, the next iteration’s EnsureCleanupClusterExtension will NOT delete this iteration’s ClusterExtension (CE); instead, this iteration’s CE/CRB/SA/namespace are removed by the deferred cleanup handlers registered in this spec’s exit. Edit the comment block referencing EnsureCleanupClusterExtension, packageName, crdName, CE, CRD and DeferCleanup to state that per-iteration unique names prevent cross-iteration deletion and that cleanup is handled by this iteration’s deferred cleanup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a20ba213-e821-429c-a144-904f54876bed
📒 Files selected for processing (1)
openshift/tests-extension/test/olmv1-singleownnamespace.go
|
/payload-aggregate periodic-ci-openshift-release-main-ci-5.0-e2e-gcp-ovn-techpreview 10 |
|
@tmshort: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/66516070-4887-11f1-9a80-c82391024e7f-0 |
|
/test e2e-gcp-ovn-upgrade |
|
Looks as though e2e-gcp-ovn-upgrade will pass. |
|
/hold |
|
/unhold |
|
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: grokspawn, rashmigottipati, tmshort

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/label qe-approved |
|
@bandrade: This PR has been marked as verified.

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@tmshort: Jira Issue Verification Checks: Jira Issue OCPBUGS-84943 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/cherry-pick release-4.22 |
|
@tmshort: new pull request created: #714

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |